IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

METHOD FOR INSPECTING DOCUMENT IMAGES 
BACKGROUND OF THE INVENTION 

Field of the Invention 

[0001] The present invention relates to electronic image capture and analysis, 
and specifically to the capture and analysis of bank documents such as checks, deposit 
slips, withdrawal requests, and other transaction records. 

Description of Related Art 

[0002] The current economic environment requires the efficient processing of 
bank documents such as checks, deposit slips, withdrawals, and other transactions.
These documents are processed by creating electronic digital images, where the
resulting images are archived, replacing microfilm repositories. It is important that the
captured images be of "acceptable" quality when saved, as subsequent processing,
whether it be for record-keeping, dispute, research, documentation or a large number of 
other actions, is dependent on the ability to reproduce the original document from the 
saved image. 

[0003] Recently, the ability to use digital images for transaction processing has 
advanced to the point where saved images are as valid as the original document. On 
June 25, 2003, the United States Congress passed the Check Truncation Act of 2003 to 
"facilitate check truncation by authorizing substitute checks, to foster innovation in the 
check collection system without mandating receipt of checks in electronic form, and to
improve the overall efficiency of the Nation's payments system." In this Act the term 
"truncate" means "to remove an original paper check from the check collection or return 
process and send to a recipient, in lieu of such original paper check, a substitute check 
or, by agreement, information relating to the original check (including data taken from 
the MICR line of the original check or an electronic image of the original check),
whether with or without subsequent delivery of the original paper check." 

[0004] In an environment where the original document is kept only as the 
electronic facsimile, and is the basis for "substitute" documents, accepted as the 
original, image quality is vitally important. 

[0005] Current art includes the mechanical and electronic scanning and capture 
of bank documents (checks, deposit slips, etc.), where a document is moved beneath a 
light source and magnetic reading device (for MICR characters). The captured image is 
then processed by hardware and software systems to collect, examine, process, store, 
and label the document image. Such systems are prone to physical, optical, 
mechanical, and environmental conditions that lead to poor image quality. Poor image
quality may be caused by, for example, dirt or dust on the document or on the optical
light source or lens; poorly focused optical equipment; loose or defective cables or
components; analog or digital conversion failures; software failures; data path errors;
damaged, folded, torn, or perforated documents; or documents that have too much or
too little contrast between the document background and the written portions.
Documents may also be of poor quality due to skew, rotation, or inversion of the
document, ink blots, fingerprints, and stains of many origins. It is therefore important to
detect and, where possible, correct poor image quality and, when the image cannot be
corrected, to identify the document and process it again or by another method to
collect a quality document image.

[0006] It is important to differentiate between areas of the document, and decide 
on the quality of the image based on the overall quality detected. In the case of bank 
documents, some areas of the document are more important than others. For example, 
a check with overall poor quality may be usable if the quality of the payee, amount, and 
date areas of the check are acceptable. In processing a check, it is not overall image 
quality that is important. A check may have acceptable quality while the quality within 
the vital check areas, legal amount or other text information on a bank document, may 
independently be unacceptable. Bank documents have different vital image areas; for
example, a check has different vital data than a deposit slip. Current
processing methods do not allow for the processing of different types of bank 
documents and processing different vital data positions within each type of bank 
document. 

[0007] A need has arisen for a high speed image capture system which inspects 
each image and determines whether the image is acceptable based upon the quality of 
areas of the image rather than overall quality of the image. A need has further arisen 
for a method to examine payee, legal amount, and other text areas of a check and 
determine a quality confidence value for use in determining acceptable image quality. 

SUMMARY OF THE INVENTION 
[0008] A method of determining acceptable bank document quality is disclosed 
in which a digital image of the document is examined, the vital document areas located, 
and a quality confidence calculated. A determination of acceptable image quality is
made using the calculated confidence values for the examined areas using two or more 
confidence techniques. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0009] For a more complete understanding of the present invention and for 
further advantages thereof, reference is now made to the following Description of the 
Preferred Embodiments taken in conjunction with the accompanying Drawings in which: 

FIG. 1 is a block diagram illustrating the steps of the present method; and 

FIG. 2 is a block diagram illustrating the steps of an alternative embodiment of the
present method.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0010] The present invention processes bank documents as digital images. For
each document image, two or more areas of the image are examined and a
"confidence level" is calculated for each area. The confidence level for the document
image is then assigned as a combination of the separately calculated area confidence
levels.

[0011] Referring to FIG. 1, the present method 100 begins with a file 102 of 
compressed document images. These images result from the output of the mechanical 
and electronic scanning of the document original. Any number of devices and
techniques may be used to collect these images. Devices such as the IBM 3897 Image
Camera or NCR 7780 Check Sorter provide such document image files. Each
document image is examined in the following steps. The image from the image file 102
is decompressed at step 104 using decompression techniques according to the 
compression scheme used. Since many bank documents have a "safety pattern" on the 
document surface making it difficult to forge, the next step 106 is to digitally remove this 
pattern from the image. At step 106, the image is converted if necessary from
grey-scale to black and white. In grey-scale, each pixel is assigned a number from 0
indicating white to some maximum, for example 255, indicating black. In step 106, all 
pixels with a grey-scale value less than some threshold, for example, 128, are changed 
to white, and those above the threshold are changed to black. 
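
By way of illustration only, the black and white conversion of step 106 may be sketched as follows. The function name, the list-of-rows image representation, and the treatment of a pixel exactly at the threshold are assumptions made for this sketch and are not taken from the description above.

```python
def to_black_and_white(grey_rows, threshold=128):
    """Binarize a grey-scale image given as rows of integers.

    Uses the convention of the text: 0 is white and the maximum (for
    example 255) is black. Returns rows of 0 (white) and 1 (black).
    """
    # Assumption: a pixel exactly at the threshold is treated as black;
    # the description only covers values below and above the threshold.
    return [[1 if value >= threshold else 0 for value in row]
            for row in grey_rows]


# Small hypothetical 2 x 3 image.
image = [[0, 100, 128], [200, 255, 127]]
binary = to_black_and_white(image)
```

Representing black as 1 keeps later pixel counts simple, since the number of black pixels in a row is then the sum of that row.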

[0012] The payee text is located at step 108 by first locating the payee line on 
the check. If the line is found, all of the characters within a certain distance above the 
line (based on the document type) are located. The Payee Text Confidence (PTC) is 
calculated at step 110 by dividing the total number of pixels in the discovered characters
by the expected value of the total number of pixels in the payee text area. For example,
if the total number of pixels in the characters identified is 1,000 and the expected value
is 2,000, then the confidence is 1,000 / 2,000 = 0.5. If the calculation equals a
confidence level greater than 1.0, then 1.0 is assigned. The expected number of pixels
is empirically determined by examining a number of known acceptable and 
unacceptable document images. Once set, this expected number is used for all 
documents of a document type. 
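
The ratio and cap described for the Payee Text Confidence may be sketched as shown below. The function name and the guard against a non-positive expected value are assumptions added for illustration; the expected pixel count itself would be the empirically determined value for the document type.

```python
def text_confidence(character_pixels, expected_pixels):
    """Ratio of pixels found in located characters to the expected count,
    capped at 1.0 as described for the Payee Text Confidence (PTC)."""
    if expected_pixels <= 0:
        # Assumption: the expected value is always a positive constant.
        raise ValueError("expected_pixels must be positive")
    return min(1.0, character_pixels / expected_pixels)


# Worked example from the text: 1,000 pixels found, 2,000 expected.
ptc = text_confidence(1000, 2000)
# A result above 1.0 is capped.
capped = text_confidence(3000, 2000)
```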

[0013] The legal amount text is located at step 112 in the same way as the
payee text: all the characters within a distance above a line, based on document type,
are located, for example, within 0.5 cm of the line for checks. The Legal Amount Text
Confidence (LTC) is calculated at step 114 by dividing the total number of pixels in the
legal amount text area by the expected value of the total number of pixels in the legal
amount text area.

[0014] All text in the image is located at step 116 by identifying groups of
touching pixels, each called a "pixel group." An examination of the pixel groups then
identifies characters made up of pixel groups (similar to the technology of optical 
character recognition). 

[0015] The All Text Confidence (ATC) is calculated at step 118 using the 
following formula: 

ATC = 1.0 - (totalPixelGroups - totalCharacters) / totalCharacters
[0016] If the calculation results in a negative number (in the case that the total 
number of pixel groups is greater than twice the total number of characters), ATC is set 
to zero. 
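
One possible sketch of steps 116 and 118 follows. Pixel groups are counted with a flood fill in which "touching" is assumed to include diagonal neighbours; that choice, the function names, and the handling of an image with no characters are assumptions. The identification of characters from pixel groups is not shown, so the character count is simply passed in.

```python
from collections import deque


def count_pixel_groups(binary_rows):
    """Count groups of touching black pixels (1 = black, 0 = white)."""
    height = len(binary_rows)
    width = len(binary_rows[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    groups = 0
    for r in range(height):
        for c in range(width):
            if binary_rows[r][c] != 1 or seen[r][c]:
                continue
            groups += 1                      # new, unvisited group found
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                     # breadth-first flood fill
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary_rows[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return groups


def all_text_confidence(total_pixel_groups, total_characters):
    """ATC = 1.0 - (groups - characters) / characters, floored at zero."""
    if total_characters <= 0:
        return 0.0  # assumption: no characters means no confidence
    atc = 1.0 - (total_pixel_groups - total_characters) / total_characters
    return max(0.0, atc)


# Hypothetical 4 x 5 image containing four separate pixel groups.
page = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
group_count = count_pixel_groups(page)
```

With 15 pixel groups and 10 characters the formula gives 1.0 - 5/10 = 0.5; with 25 pixel groups and 10 characters the raw value is negative, so the confidence is set to zero.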

[0017] An image profile is created at step 120 by counting the number of black 
pixels in each row (horizontal) of the document. A Profile Confidence (PC) is calculated 
at step 122. The PC is calculated in the following steps: 

1. The mean number of black pixels per row is calculated from the total
number of black pixels, divided by the number of rows.

2. The variance and standard deviation of the distribution of black pixels
in each row are calculated.

3. The black pixel density is computed by counting the number of black
pixels in a fixed image area and dividing by the total number of pixels
in that same area.

4. The Profile Confidence (PC) is determined by selecting the smaller of
the standard deviation confidence and the pixel density confidence.
Both values result from measurements of how well the standard
deviation and the black pixel density fall within default ranges
determined by experimentation or through user-set values.



stdDevFactor = (max allowable Std Deviation - min allowable Std Deviation) / 2

if ( stdDev <= stdDevFactor )
    stdDevConf = stdDev * 100 / stdDevFactor
else
    stdDevConf = 200 - (stdDev * 100 / stdDevFactor)

pDensityFactor = (max allowable pixel Density - min allowable pixel Density) / 2

if ( pDensity <= pDensityFactor )
    pDensityConf = pDensity * 100 / pDensityFactor
else
    pDensityConf = 200 - (pDensity * 100 / pDensityFactor)

if ( stdDevConf < 0 )
    stdDevConf = 0
if ( pDensityConf < 0 )
    pDensityConf = 0

if ( stdDevConf <= pDensityConf )
    PC = stdDevConf
else
    PC = pDensityConf
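
The Profile Confidence procedure of steps 120 and 122 may be sketched in runnable form as follows. This is an illustration under stated assumptions: the population variance is used, the "fixed image area" for the density is taken to be the whole image, and the allowable ranges passed in are hypothetical values rather than ranges given in this description.

```python
import math


def range_confidence(value, min_allowed, max_allowed):
    """Confidence on a 0 to 100 scale that peaks when the value equals
    half the width of the allowable range and is floored at zero."""
    factor = (max_allowed - min_allowed) / 2
    if factor <= 0:
        raise ValueError("max_allowed must exceed min_allowed")
    if value <= factor:
        confidence = value * 100 / factor
    else:
        confidence = 200 - value * 100 / factor
    return max(0.0, confidence)


def profile_confidence(binary_rows, std_dev_range, density_range):
    """Smaller of the standard deviation and pixel density confidences."""
    # Image profile: number of black pixels (1s) in each row.
    profile = [sum(row) for row in binary_rows]
    mean = sum(profile) / len(profile)
    # Assumption: population variance over the rows.
    variance = sum((n - mean) ** 2 for n in profile) / len(profile)
    std_dev = math.sqrt(variance)
    # Assumption: the fixed area for the density is the entire image.
    total_pixels = sum(len(row) for row in binary_rows)
    density = sum(profile) / total_pixels
    return min(range_confidence(std_dev, *std_dev_range),
               range_confidence(density, *density_range))


# Hypothetical 4 x 4 image: row profile [2, 0, 2, 0], density 0.25.
image = [[1, 1, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
pc = profile_confidence(image, (0, 4), (0.0, 0.5))
```

In this example the standard deviation is 1.0 against a factor of 2, giving a confidence of 50, while the density of 0.25 equals its factor, giving 100; the smaller value, 50, is the Profile Confidence.
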
[0018] The image confidence (IC) is calculated at step 124 using the formula:

IC = PTC * LTC * ATC * PC

where PTC is the payee text confidence, LTC is the legal amount text confidence, ATC
is the all text confidence, and PC is the profile confidence. The Image Confidence is
compared to a threshold for the document type of the image at step 126. If the IC is
greater than or equal to the threshold, then the image is acceptable and no further
processing occurs at step 132. If the IC is less than the threshold value for the
document type, then the digital image is visually examined by a human operator for
quality in the vital information areas at step 128. If the examiner determines that the
image is acceptable at step 130, then no further processing is done.
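
The combination and threshold test of steps 124 and 126 may be sketched as below. The function names and the numeric values are hypothetical. The sketch also assumes the threshold is expressed on the same scale as the product, which matters because the Profile Confidence procedure yields values from 0 to 100 while the text confidences lie between 0 and 1.

```python
def image_confidence(ptc, ltc, atc, pc):
    """IC = PTC * LTC * ATC * PC."""
    return ptc * ltc * atc * pc


def needs_operator_review(ic, threshold):
    """True when IC falls below the threshold for the document type,
    in which case the image goes to a human operator (step 128)."""
    return ic < threshold


# Hypothetical confidences for one check image.
ic = image_confidence(0.5, 1.0, 0.5, 80.0)
review = needs_operator_review(ic, threshold=25.0)
```

Because the confidences are multiplied, a single low area confidence pulls the image confidence down sharply, and a zero in any area forces operator review regardless of the other areas.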

[0019] If the examiner determines that the image is unacceptable at step 130,
then the original physical document is located at step 134 and rescanned at step 136.
The document may be modified, for example by removing dirt, before scanning. The
document may be scanned using different equipment (for example, higher resolution) or
different scanning parameters (for example, light intensity or contrast). The rescanned
image is visually inspected by an operator at step 138, where further modifications and
adjustments to the image may be made before the document image is placed in a
replacement image file at step 140 for subsequent processing beginning at step
102.

[0020] FIG. 2 illustrates a second embodiment of the present method. The 
method 200 begins with images in compressed digital form in an image file 202. Each
image in turn is read from the file and decompressed at step 204. Any safety pattern is 
removed from the image at step 206. An image profile is constructed at step 208 as a 
histogram, taking each row of the image and counting the number of black pixels. This 
process includes removing any skew from the image before constructing the profile.
From the document profile, a Profile Confidence (PC) is calculated at step 210 using the
procedure described for step 122 of the first embodiment.



[0021] With the Profile Confidence calculated, the text fields of the document 
image are located at step 212 according to the document type and each Field 
Confidence (FC) is initialized to the Profile Confidence. The text fields are different for 
each document type. For example, text fields in a check include a payee name, legal
amount, courtesy amount, date, and signature, while text fields in a deposit slip include
legal amount, courtesy amount, date, and signature. For each document text field, the
field characters are located at step 214 and the Field Confidence is altered based on 
the mass of each character within the field. The Field Confidence level (FC) is updated 
at step 216 for each character mass (CM) confidence level according to the following: 



MinCM = constant
FC = Profile Confidence

For each character
    CM = number of black pixels in the character
    if CM <= MinCM then
        FC = FC (unchanged)
    if CM > MinCM and CM <= 2*MinCM then
        FC = FC * (1.0 - (CM - MinCM) / MinCM)
    if CM > 2*MinCM then
        FC = 0
Next character

For each character between the minimum character mass (MinCM) and twice the 
minimum character mass (2*MinCM), the Field Confidence (FC) is reduced by the 
percentage that the character mass is greater than the minimum character mass. If any 
character is more than twice the minimum character mass, the Field Confidence FC is 
zero. 
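
The character mass update of step 216 may be sketched as follows. The function name and the example values are hypothetical, and the character masses are assumed to have been measured already as black pixel counts.

```python
def apply_character_mass(profile_confidence, character_masses, min_cm):
    """Start the Field Confidence at the Profile Confidence and reduce it
    for each character heavier than the minimum character mass."""
    fc = profile_confidence
    for cm in character_masses:
        if cm <= min_cm:
            continue                  # FC unchanged
        if cm <= 2 * min_cm:
            # Reduce by the fraction by which CM exceeds MinCM.
            fc *= 1.0 - (cm - min_cm) / min_cm
        else:
            return 0.0                # any character above 2 * MinCM
    return fc


# Hypothetical field: masses of 40, 75 and 50 pixels with MinCM = 50.
fc = apply_character_mass(80.0, [40, 75, 50], min_cm=50)
```

Here only the 75 pixel character alters the result: it exceeds the minimum by 50 percent, so the Field Confidence drops from 80 to 40.
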
[0022] The Field Confidence is updated based on the number of "broken
characters" in the field at step 218, where a broken character is one in which parts of the
character have dropped out, resulting in a character that is made up of several pixel
groups. The field confidence FC is reduced by the formula:

FC = FC * (1.0 - BC / TC)

where BC is the number of broken characters and TC is the total
number of characters in the field.
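
The broken character reduction of step 218 may be sketched as below. The function name, the argument check, and the decision to leave the confidence unchanged for a field with no characters are assumptions made for illustration.

```python
def apply_broken_characters(fc, broken_characters, total_characters):
    """FC = FC * (1.0 - BC / TC)."""
    if total_characters <= 0:
        # Assumption: a field with no characters is left unchanged.
        return fc
    if not 0 <= broken_characters <= total_characters:
        raise ValueError("broken_characters must be between 0 and total")
    return fc * (1.0 - broken_characters / total_characters)


# Hypothetical field: one broken character out of four.
fc = apply_broken_characters(40.0, 1, 4)
```

One broken character in four reduces the Field Confidence by a quarter, from 40 to 30; a field in which every character is broken receives a confidence of zero.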

[0023] All pre-printed lines are located at step 220 using, for example, the 
Hough Transform or other well-known techniques that locate collinear line segments. 
Each text field is then examined for a pre-printed line. If a line is found, then all 
characters above the line to a distance (determined by the document type) are located. 
The Line Area Confidence (LAC) is calculated as the ratio of the number of pixels in the 
characters found above the line, divided by a predetermined expected number of pixels. 
The field confidence, FC, is then updated at step 222 by: 

FC = FC * LAC 

Where LAC is the Line Area Confidence 
[0024] The overall confidence for the image is set as the minimum of the Field 
Confidence values for all text fields at step 224. The overall confidence value is 
compared to a threshold for the document type of the image at step 226. If the value is 
greater than or equal to the threshold, then the image is acceptable and no further 
processing occurs. If the value is less than the threshold value for the document type, 
then the digital image is visually examined by a human operator for quality in the vital 
information areas at step 228. If the examiner determines that the image is acceptable 
at step 230, then no further processing is done. 
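
The selection of the overall confidence at step 224 and the comparison at step 226 may be sketched as follows. The field names, confidence values, and per-document-type thresholds are hypothetical and serve only to illustrate the flow.

```python
def overall_confidence(field_confidences):
    """Overall confidence is the minimum Field Confidence of all fields."""
    if not field_confidences:
        raise ValueError("at least one text field is required")
    return min(field_confidences.values())


def is_acceptable(confidence, document_type, thresholds):
    """Acceptable when the confidence meets the document type threshold;
    otherwise the image goes to a human operator (step 228)."""
    return confidence >= thresholds[document_type]


# Hypothetical Field Confidence values for the text fields of a check.
check_fields = {
    "payee": 72.0,
    "legal_amount": 30.0,
    "courtesy_amount": 65.0,
    "date": 80.0,
    "signature": 55.0,
}
# Hypothetical thresholds per document type.
thresholds = {"check": 50.0, "deposit_slip": 40.0}

overall = overall_confidence(check_fields)
accepted = is_acceptable(overall, "check", thresholds)
```

Taking the minimum means the weakest vital field decides the outcome: in this example the legal amount field alone sends the image to operator review.
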
[0025] If the examiner determines that the image is unacceptable at step 230,
then the original physical document is located at step 232, and rescanned at step 234. 
The document may be modified, for example by removing dirt, before scanning. The 
document may be scanned using different equipment (for example, higher resolution) or 
different scanning parameters (for example light intensity, color, or contrast). The 
rescanned image is visually inspected by an operator at step 236 where further 
modification and adjustments to the image may be made before the document image is 
saved at step 238. The image is then placed in a replacement image file 240 for 
subsequent processing beginning at the step 202. 

[0026] In the present method a document image is assigned a confidence level 
by examining portions of the document. The higher the calculated confidence level, the 
better the document image is determined to be. Low confidence level documents are 
processed first by a human operator, then by rescanning the document until an 
acceptable image is captured. The confidence level emphasizes fields within the 
document so that if the vital document fields are acceptable, the document image as a 
whole is deemed acceptable. 

[0027] It therefore can be seen that the present method determines an image to 
be of acceptable quality by considering the quality in vital areas of the document, and 
not the quality of the document as a whole. The method allows a document to be 
accepted if the vital area images are acceptable, even though the whole document
may be unacceptable. The method would also deem unacceptable a document where
the vital areas are unacceptable even when the whole document image is of acceptable
quality. The present method determines images to be unacceptable where the poor
quality is caused by the writing instrument, pen, or printer, or when the document
background (for example, an artistic scene on a check) makes the vital document area
content unacceptable as characters.

[0028] Other alterations and modifications of the invention will likewise become
apparent to those of ordinary skill in the art upon reading the present disclosure, and it 
is intended that the scope of the invention disclosed herein be limited only by the 
broadest interpretation of the appended claims to which the inventor is legally entitled. 